InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling

Abstract

Cross-lingual topic models have been prevalent for cross-lingual text analysis by revealing aligned latent topics. However, most existing methods suffer from producing repetitive topics that hinder further analysis and from performance decline caused by low-coverage dictionaries. In this paper, we propose Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment used in previous work, we propose a topic alignment with mutual information method. This works as a regularization to properly align topics and prevent degenerate topic representations of words, which mitigates the repetitive topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment beyond the translations of a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines, producing more coherent, diverse, and well-aligned topics and showing better transferability for cross-lingual classification tasks.
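The abstract does not say how mutual information is estimated. As an illustration only, the sketch below uses an InfoNCE-style contrastive loss, a common lower-bound estimator that is assumed here and not taken from the paper. The function name `info_nce`, the temperature value, and the toy word-topic vectors are all hypothetical. Each linked word pair across languages is treated as a positive, and the other words in the batch act as negatives.

```python
import math


def info_nce(src, tgt, temperature=0.2):
    """Mean InfoNCE loss where src[i] and tgt[i] form a positive pair.

    The loss is non-negative, and log(N) - loss is a lower bound on the
    mutual information between the two views (N = number of pairs).
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    total = 0.0
    for i, u in enumerate(src):
        logits = [cosine(u, v) / temperature for v in tgt]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(src)


# Hypothetical topic representations of three linked word pairs
# (each row is a distribution over three topics).
en_words = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
zh_aligned = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
zh_shuffled = [zh_aligned[1], zh_aligned[2], zh_aligned[0]]

# Degenerate case: every word has the same topic representation.
degenerate = [[1 / 3, 1 / 3, 1 / 3]] * 3

aligned_loss = info_nce(en_words, zh_aligned)
shuffled_loss = info_nce(en_words, zh_shuffled)
degenerate_loss = info_nce(degenerate, degenerate)
```

In this toy setting the correctly linked pairs give a lower loss than the shuffled pairs. The degenerate case yields exactly log(3), so the bound log(N) - loss is zero. That is one way to see why such an objective discourages collapsed representations.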

Related resources

Cross-Lingual Latent Topic Extraction

Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other. In this paper, we propose a way to ...

Cross-lingual Information Retrieval Model Based on Bilingual Topic Correlation

Constructing relationships between bilingual texts is important for effectively processing multilingual text data and crossing language barriers. Corpus-based cross-lingual latent semantic indexing (CL-LSI) does not fully take into account bilingual semantic relationships. The paper proposes a new model that builds the semantic relationship of bilingual parallel documents via partial least squares (PLS)....

Alignment by Maximization of Mutual Information

A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation, few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can ...

Multilingual and cross-lingual news topic tracking

We are presenting a working system for automated news analysis that ingests an average total of 7600 news articles per day in five languages. For each language, the system detects the major news stories of the day using a group-average unsupervised agglomerative clustering process. It also tracks, for each cluster, related groups of articles published over the previous seven days, using a cosin...

Statistical mechanics of mutual information maximization

An unsupervised learning procedure based on maximizing the mutual information between the outputs of two networks receiving different but statistically dependent inputs is analyzed (Becker S. and Hinton G., Nature, 355 (1992) 161). By exploiting a formal analogy to supervised learning in parity machines, the theory of zero-temperature Gibbs learning for the unsupervised procedure is presented...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26612